351 research outputs found
An Interpretability Framework for Similar Case Matching
Similar Case Matching (SCM) plays a pivotal role in the legal system by
facilitating the efficient identification of similar cases for legal
professionals. While previous research has primarily concentrated on enhancing
the performance of SCM models, the aspect of interpretability has been
neglected. To bridge this gap, this study proposes an integrated pipeline
framework for interpretable SCM. The framework comprises four modules: judicial
feature sentence identification, case matching, feature sentence alignment, and
conflict resolution. In contrast to current SCM methods, our framework first
extracts feature sentences within a legal case that contain essential
information. Then it conducts case matching based on these extracted features.
Subsequently, our framework aligns the corresponding sentences in two legal
cases to provide evidence of similarity. In instances where the results of case
matching and feature sentence alignment exhibit conflicts, the conflict
resolution module resolves these inconsistencies. The experimental results show
the effectiveness of our proposed framework, establishing a new benchmark for
interpretable SCM.
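The four-module pipeline described in this abstract could be sketched as follows. Everything here is an illustrative toy, not the authors' implementation: feature-sentence identification is approximated by a keyword filter, case matching by Jaccard similarity, alignment by greedy token overlap, and conflict resolution by a simple consistency check; all function names and keywords are invented.

```python
# Toy sketch of a four-module interpretable SCM pipeline (hypothetical;
# the paper's actual models are not public knowledge here).

def split_sentences(case: str) -> list[str]:
    return [s.strip() for s in case.split(".") if s.strip()]

def identify_feature_sentences(case: str) -> list[str]:
    # Module 1: keep sentences containing (invented) judicial keywords.
    keywords = {"defendant", "contract", "damages", "theft"}
    return [s for s in split_sentences(case)
            if keywords & set(s.lower().split())]

def case_matching(feats_a: list[str], feats_b: list[str]) -> float:
    # Module 2: Jaccard similarity over feature-sentence tokens.
    ta = {w for s in feats_a for w in s.lower().split()}
    tb = {w for s in feats_b for w in s.lower().split()}
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def align_sentences(feats_a: list[str], feats_b: list[str]) -> list[tuple[str, str]]:
    # Module 3: pair each feature sentence in A with its best-overlapping
    # counterpart in B, as evidence of similarity.
    pairs = []
    for sa in feats_a:
        best = max(feats_b, default=None,
                   key=lambda sb: len(set(sa.lower().split())
                                      & set(sb.lower().split())))
        if best and set(sa.lower().split()) & set(best.lower().split()):
            pairs.append((sa, best))
    return pairs

def match_cases(case_a: str, case_b: str) -> dict:
    feats_a = identify_feature_sentences(case_a)
    feats_b = identify_feature_sentences(case_b)
    score = case_matching(feats_a, feats_b)
    evidence = align_sentences(feats_a, feats_b)
    # Module 4: resolve conflicts, e.g. a high match score with no
    # aligned evidence is treated as inconsistent and zeroed out.
    if score > 0.5 and not evidence:
        score = 0.0
    return {"score": score, "evidence": evidence}
```

The key design point the abstract emphasizes is that matching operates on extracted feature sentences rather than whole documents, so the alignment in module 3 doubles as human-readable evidence for the score.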
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
In this paper, we analyse the generalization ability of binary classifiers
for the task of deepfake detection. We find that the stumbling block to their
generalization is the identity representation unexpectedly learned from
images. Termed the Implicit Identity Leakage, this phenomenon has been
qualitatively and quantitatively verified among various DNNs. Furthermore,
based on such understanding, we propose a simple yet effective method named the
ID-unaware Deepfake Detection Model to reduce the influence of this phenomenon.
Extensive experimental results demonstrate that our method outperforms the
state-of-the-art in both in-dataset and cross-dataset evaluation. The code is
available at https://github.com/megvii-research/CADDM.
Comment: Accepted by CVPR 202
Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis
Large language models (LLMs) have demonstrated remarkable potential in
handling multilingual machine translation (MMT). In this paper, we
systematically investigate the advantages and challenges of LLMs for MMT by
answering two questions: 1) How well do LLMs perform in translating a massive
number of languages? 2) Which factors affect LLMs' performance in translation?
We evaluate popular LLMs, including XGLM, OPT, BLOOMZ, and ChatGPT, on 102
languages. Our empirical results show that even the best model, ChatGPT, still
lags behind the supervised baseline NLLB in 83.33% of translation directions.
Through further analysis, we discover that LLMs exhibit new working patterns
when used for MMT. First, prompt semantics can surprisingly be ignored when
given in-context exemplars, where LLMs still show strong performance even with
unreasonable prompts. Second, cross-lingual exemplars can provide better task
instruction for low-resource translation than exemplars in the same language
pairs. Third, we observe the overestimated performance of BLOOMZ on the
Flores-101 dataset, indicating the potential risk when using public datasets
for evaluation.
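The abstract's second finding, that cross-lingual exemplars can instruct the model better than same-pair exemplars, amounts to a choice in how the few-shot prompt is assembled. A minimal sketch, with an invented prompt template and invented exemplar data (not the paper's actual prompts):

```python
# Hypothetical few-shot MMT prompt builder. Exemplars may come from a
# *different* language pair than the query, which the abstract reports
# can help low-resource translation directions.

def build_prompt(src_text: str, src_lang: str, tgt_lang: str,
                 exemplars: list[tuple[str, str, str, str]]) -> str:
    lines = []
    for ex_src_lang, ex_src, ex_tgt_lang, ex_tgt in exemplars:
        lines.append(f"{ex_src_lang}: {ex_src}")
        lines.append(f"{ex_tgt_lang}: {ex_tgt}")
    # The query follows the same pattern; the LLM completes the last line.
    lines.append(f"{src_lang}: {src_text}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)

# Cross-lingual exemplar: a French-Spanish pair guiding a
# German-Icelandic query (all strings are illustrative).
prompt = build_prompt(
    "Guten Morgen", "German", "Icelandic",
    exemplars=[("French", "Bonjour", "Spanish", "Buenos dias")],
)
```

The same builder also illustrates the first finding: since the "instruction" is carried entirely by the exemplar pattern, the surrounding prompt semantics can be weak or even unreasonable without destroying performance.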
Extrapolating Large Language Models to Non-English by Aligning Languages
Existing large language models show disparate capability across different
languages, due to the imbalance in the training data. Their performance on
English tasks is often stronger than on tasks in other languages. In this
paper, we empower pre-trained LLMs on non-English languages by building
semantic alignment across languages. We start from targeting individual
languages by performing cross-lingual instruction-tuning (CoIT) on LLaMA, i.e.
tuning it with translation task data and cross-lingual general task data to
obtain cross-lingual models (x-LLaMAs), and formulate underlying scaling laws
to investigate the advantages of using scalable translation data. Then we
perform multilingual instruction-tuning (MuIT) with mixed resources to build
multilingual m-LLaMA. We also illustrate how we leverage the scaling laws to
optimize data allocation in a resource-constrained setting. Experiment results
on cross-lingual benchmarks XQUAD and MLQA show that x-LLaMAs surpass the
English instruction-tuned counterpart (Alpaca) by an average of 27.83% across
six non-English languages. Evaluation results on translation dataset Flores-101
show that x-LLaMAs outperform previous LLaMA-based models by an average of
18.89%. Encouragingly, m-LLaMA achieves comparable performance to x-LLaMAs on
individual languages and demonstrates the ability to follow multilingual
instructions. Further analysis on response content and representation space
reveals the alignment of the multilingual semantic space within the middle
layers of m-LLaMA.
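The cross-lingual instruction-tuning (CoIT) recipe the abstract describes mixes translation task data with cross-lingual general task data into one tuning set. A minimal sketch of assembling such a mixture, with invented record fields in the common instruction/input/output style (not the authors' data format):

```python
# Hypothetical CoIT data-mixture builder: translation pairs plus
# general instruction-following examples, flattened into one list of
# training records. Field names are assumptions, not the paper's schema.

def build_coit_mixture(translation_pairs: list[tuple[str, str]],
                       general_tasks: list[tuple[str, str]]) -> list[dict]:
    data = []
    for src, tgt in translation_pairs:
        data.append({
            "instruction": "Translate the following text.",
            "input": src,
            "output": tgt,
        })
    for instruction, answer in general_tasks:
        data.append({"instruction": instruction, "input": "", "output": answer})
    return data

# Toy mixture: one translation example, one general task example.
mixture = build_coit_mixture(
    [("Hello", "Hola")],
    [("Name a primary color.", "Red")],
)
```

Under the paper's framing, scaling the translation portion of this mixture is what the fitted scaling laws govern, and multilingual instruction-tuning (MuIT) is the same construction repeated over mixed resources from many languages.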